
Does the usage of an online EFL workbook 
conform to Benford’s law? 

Mikolaj Olszewski¹, Kacper Lodzikowski², Jan Zwolinski³,
Rasil Wamakulasooriya⁴, and Adam Black⁵


Abstract. The aim of this paper is to explore whether English as a Foreign Language (EFL)
learners’ usage of an online workbook follows Benford’s law, which predicts the
frequency of leading digits in numbers describing natural phenomena. According
to Benford (1938), one can predict the frequency distribution of leading digits in
numbers describing natural datasets, e.g. river lengths. In such numbers, the digit
1 occurs most frequently, while the digit 9 occurs least frequently. This counterintuitive
phenomenon has attracted the attention of researchers seeking inconsistencies
in data, e.g. false tax claims (Miller, 2015). We show that the practical application of
Benford’s law could extend to detecting abnormal learner behaviour in online EFL
products. First, we show that the distributions of leading digits of the number of
online activities submitted by EFL learners on an e-learning platform and of the time
spent on those activities do indeed follow Benford’s law. Then, we show that some
learners whose usage does not conform to Benford’s law behave abnormally
relative to their peers: in particular, they submit many activities within
a few days, which could suggest, for example, poor time management.

Keywords: Benford’s law, EFL, e-learning, time on task.


1. Pearson IOKI, Poznan, Poland; mikolaj.olszewski@pearson.com

2. Pearson IOKI, Poznan, Poland; kacper.lodzikowski@pearson.com

3. Pearson IOKI, Poznan, Poland; jan.zwolinski@pearson.com

4. Pearson PLC, Boston, United States; rasil.wamakulasooriya@pearson.com

5. Pearson PLC during this research, now at Macmillan Learning, New York City, United States; adam.black@macmillan.com

How to cite this article: Olszewski, M., Lodzikowski, K., Zwolinski, J., Wamakulasooriya, R., & Black, A. (2016). Does the
usage of an online EFL workbook conform to Benford’s law? In S. Papadima-Sophocleous, L. Bradley, & S. Thouesny (Eds),
CALL communities and culture - short papers from EUROCALL 2016 (pp. 351-357). Research-publishing.net.
https://doi.org/10.14705/rpnet.2016.eurocall2016.587

©2016 Mikolaj Olszewski, Kacper Lodzikowski, Jan Zwolinski et al. (CC BY-NC-ND 4.0)


1. Introduction

Benford (1938) stated that it is possible to predict the frequency distribution of
leading digits in numbers composed of four or more digits describing such natural
datasets as river lengths or city populations. In such numbers, the digit 1 is expected
to be the most frequently occurring leading digit (about 30% of cases), while the
digit 9 is expected to occur least frequently (fewer than 5% of cases), even
though one might intuitively expect all leading digits to occur equally often. In recent
years, Benford’s law has attracted the attention of researchers because of its practical
uses, e.g. identifying tax or vote fraud (Miller, 2015).
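The proportions quoted above (about 30% for the digit 1, under 5% for the digit 9) follow from Benford’s formula P(d) = log10(1 + 1/d) for d = 1, ..., 9. The following minimal Python sketch, written for illustration and not taken from the study (which used R), computes them:

```python
import math


def benford_first_digit_probs() -> dict[int, float]:
    """Expected share of each leading digit d = 1..9 under Benford's law."""
    # P(d) = log10(1 + 1/d); the nine terms telescope to log10(10) = 1.
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


if __name__ == "__main__":
    for digit, share in benford_first_digit_probs().items():
        print(f"{digit}: {share:.3f}")
```

Run as a script, it prints 0.301 for the digit 1, falling steadily to 0.046 for the digit 9.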

In education, Benford’s law has been applied to evaluating the chance of picking 
the correct answer among distractors on a multiple-choice test (Slepkov, Ironside, 
& DiBattista, 2015). We know of no previous work exploring the application of 
Benford’s law to e-learning of EFL, hence the present study. 


2. Method 


2.1. Data 

According to Nigrini (2012, pp. 21-22), numbers in a dataset are expected to 
conform to Benford’s law if they describe natural events or facts such as city 
populations (rather than, say, computer-generated bank account numbers) and if 
the dataset has no inherent limit (which excludes, say, exam scores). 

We focused on the number of online learning activities completed by EFL learners 
and the time spent on those activities. The data comes from MyEnglishLab for 
Speakout Pre-intermediate 1st edition (henceforth ‘MyEnglishLab’), an e-learning 
platform with exercises accompanying a textbook. The online activities are grouped into
twelve units, each of which contains about thirty activities. The platform is aimed at
institutions, so most learners analysed here were enrolled in a course set up by their 
teacher or instructor. The anonymised dataset contains 3,218,624 first attempts 
of MyEnglishLab activities from 35,265 learners from 18 different countries 
(speaking 12 different languages). 

2.2. Analysis 

To see whether the number of MyEnglishLab activities completed by learners conforms to
Benford’s law, we counted the total number of activities submitted (i.e. attempted)
daily by each learner. Days with no learner activity were not included. Resubmitting
the same activity did not increase the count. For example, if Learner A submits 11
activities on Monday and three activities on Tuesday, and Learner B submits four
activities on Tuesday and six on Wednesday, the dataset contains the observations
{11, 3, 4, 6}. The frequency distribution of the leading digits of these measurements
was plotted and compared with the distribution expected under Benford’s law.
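The counting procedure can be sketched in Python as follows. The record format (learner_id, day, activity_id) is a hypothetical stand-in for the platform’s logs, and we assume that resubmissions are deduplicated within a day; the sketch reproduces the worked example above.

```python
from collections import Counter, defaultdict


def leading_digit(n: int) -> int:
    """First digit of a positive integer, e.g. 11 -> 1."""
    if n <= 0:
        raise ValueError("leading digit is only defined here for positive integers")
    return int(str(n)[0])


def daily_counts(submissions):
    """Number of distinct activities each learner submitted on each active day.

    `submissions` is an iterable of hypothetical (learner_id, day, activity_id) tuples.
    """
    activities_per_learner_day = defaultdict(set)
    for learner, day, activity in submissions:
        # A set ignores resubmissions of the same activity on the same day.
        activities_per_learner_day[(learner, day)].add(activity)
    # Days without activity never appear as keys, so they are excluded automatically.
    return [len(acts) for acts in activities_per_learner_day.values()]


def leading_digit_proportions(values):
    """Observed share of each leading digit 1..9 in `values`."""
    counts = Counter(leading_digit(v) for v in values)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}


# Worked example from the text, plus one resubmission by Learner A.
example_log = (
    [("A", "Mon", f"a{i}") for i in range(11)]
    + [("A", "Mon", "a0")]  # resubmission: not counted again
    + [("A", "Tue", f"a{i}") for i in range(11, 14)]
    + [("B", "Tue", f"b{i}") for i in range(4)]
    + [("B", "Wed", f"b{i}") for i in range(4, 10)]
)
print(daily_counts(example_log))  # [11, 3, 4, 6]
```

The leading-digit proportions of these counts are what would then be plotted against Benford’s expected curve.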

To see whether the time spent on those activities conforms to Benford’s law, we listed the time
(in seconds) that every learner spent on every first submission of a MyEnglishLab
activity. Again, the frequency distribution of the leading digits was plotted and
compared with Benford’s distribution. To evaluate conformity formally, we ran Pearson’s
chi-squared goodness-of-fit test; of the several tests for conformity to Benford’s law
available in the BenfordTests R package (Version 1.2.0; Joenssen, 2015), this one was
the fastest. Data processing and visualisation were performed in R (Version 3.2.4;
R Core Team, 2016) running in RStudio (Version 0.99.893; RStudio, 2016).
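Pearson’s chi-squared goodness-of-fit test compares the observed count of each leading digit with the count Benford’s law expects, n·log10(1 + 1/d), and sums (observed − expected)² / expected over the nine digits. The Python sketch below is our own minimal re-implementation for illustration, not the BenfordTests code; it assumes positive whole-number inputs (e.g. times in whole seconds) and uses a fixed critical value instead of a p-value.

```python
import math
from collections import Counter

# Upper 5% critical value of the chi-squared distribution with 8 degrees of
# freedom (9 digit categories minus 1), i.e. about 15.507.
CHI2_CRIT_DF8_ALPHA05 = 15.507


def benford_chisq(values) -> float:
    """Pearson chi-squared statistic: first digits of positive integers vs Benford's law."""
    digits = [int(str(v)[0]) for v in values if v > 0]
    n = len(digits)
    if n == 0:
        raise ValueError("need at least one positive value")
    observed = Counter(digits)
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)  # Benford-expected count for digit d
        stat += (observed[d] - expected) ** 2 / expected
    return stat


def conforms_to_benford(values, critical=CHI2_CRIT_DF8_ALPHA05) -> bool:
    """True if the test does not reject conformity to Benford's law at alpha = 0.05."""
    return benford_chisq(values) <= critical
```

A dataset whose digit counts match Benford’s proportions yields a statistic near zero, whereas a dataset in which every value starts with 1 yields a very large one.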


3. Discussion 

A visual inspection of Figure 1 shows that the distribution of leading digits of
the number of activities submitted daily per learner on MyEnglishLab closely follows the
curve predicted by Benford’s law, with the exception of the digit 1. This means there
were more learner-days on which the number of submitted activities began with 1
(e.g. exactly one, or 10 to 19) than Benford’s law predicts. Despite this, the number
of submitted activities can be said to conform (roughly) to Benford’s law.

Figure 1. Distribution of leading digits of the number of submitted activities
compared to Benford’s distribution (x-axis: leading digit of number of activities submitted daily by learners)


A visual inspection of Figure 2 shows that the distribution of leading digits of time
spent on single activities submitted on MyEnglishLab also closely follows Benford’s
distribution. The digit 1 is again an exception, but the fit is better than for the
number of activities. A similar result was observed for the first two leading digits
of time (not shown in this figure).
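For the first two leading digits, Benford’s law generalises to P(k) = log10(1 + 1/k) for k = 10, ..., 99. A minimal Python sketch of the expected and observed distributions follows; skipping values below 10 (which have no second digit) is our assumption, since the paper does not say how such values were handled.

```python
import math
from collections import Counter


def benford_first_two_digit_probs() -> dict[int, float]:
    """Expected share of each first-two-digit value k = 10..99: log10(1 + 1/k)."""
    return {k: math.log10(1 + 1 / k) for k in range(10, 100)}


def first_two_digits(n: int) -> int:
    """First two digits of an integer >= 10, e.g. 347 -> 34."""
    if n < 10:
        raise ValueError("values below 10 have no second digit")
    return int(str(n)[:2])


def first_two_digit_proportions(values) -> dict[int, float]:
    """Observed share of each first-two-digit value, skipping values below 10 (assumption)."""
    counts = Counter(first_two_digits(v) for v in values if v >= 10)
    total = sum(counts.values())
    return {k: counts[k] / total for k in range(10, 100)}
```

The expected shares range from about 4.1% for 10 down to about 0.44% for 99.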


Figure 2. Distribution of leading digits of time spent on MyEnglishLab activities 
compared to Benford’s distribution 



Figure 3 shows learners whose behaviour does not conform to Benford’s law; each
thin line represents a learner. Pearson’s chi-squared goodness-of-fit test (α = 0.05)
showed that, of the 12,427 learners who submitted at least 100 MyEnglishLab activities,
time on task was consistent with Benford’s law for 74% of learners and deviated
significantly from it for the remaining 26%.
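The per-learner analysis can be sketched as follows: group first-submission times by learner, keep learners with at least 100 timed submissions, and flag those whose chi-squared statistic exceeds the critical value at α = 0.05. The (learner_id, seconds) record format and the function names are hypothetical, and the sketch assumes whole-second times.

```python
import math
from collections import Counter, defaultdict

CHI2_CRIT = 15.507    # chi-squared critical value, 8 degrees of freedom, alpha = 0.05
MIN_ACTIVITIES = 100  # per-learner threshold used in the paper


def chisq_vs_benford(times) -> float:
    """Pearson chi-squared statistic of first digits of positive integer times vs Benford."""
    observed = Counter(int(str(t)[0]) for t in times)
    n = len(times)
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        stat += (observed[d] - expected) ** 2 / expected
    return stat


def flag_learners(records, min_n=MIN_ACTIVITIES, critical=CHI2_CRIT):
    """Map learner_id -> True if their times deviate significantly from Benford's law.

    `records` holds hypothetical (learner_id, seconds) pairs, one per first submission.
    Learners with fewer than `min_n` timed submissions are left out.
    """
    times_by_learner = defaultdict(list)
    for learner, seconds in records:
        if seconds > 0:  # a zero duration has no leading digit
            times_by_learner[learner].append(seconds)
    return {
        learner: chisq_vs_benford(times) > critical
        for learner, times in times_by_learner.items()
        if len(times) >= min_n
    }
```

The share of non-conforming learners is then `sum(flags.values()) / len(flags)`, the quantity reported above as 26%.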

Figure 3. Distribution of leading digits of time spent on MyEnglishLab activities
by learners whose behaviour does not conform to Benford’s law (legend: prediction from Benford’s law;
learner behaviour on MyEnglishLab not conforming to Benford’s law; x-axis: leading digit of time (s) spent on activity)


While exploring backend logs of learner interactions with MyEnglishLab, we
noticed that some of the 26% of learners whose behaviour does not conform to
Benford’s law share three characteristics. First, even if they were enrolled in a
course that lasted a couple of months, they used the platform to submit exercises
on only a couple of days. Second, on those few days of activity, the learners
submitted an unusually high number of activities, often receiving high scores.
Third, the learners seemed to have worked on these activities simultaneously, i.e. they
opened one activity after another in quick succession (probably in separate browser
tabs, although front-end interactions such as browser focus were not tracked here)
and then, after some time, submitted them one after another in quick succession.
This could be an indication of cramming.

Figure 4 shows an example of one such learner. This learner took part in what 
seemed to have been an intensive two-month course, judging by the online activity 
of other participants in that course. While other learners in the course submitted 
activities relatively frequently, this learner submitted 195 activities on three
different days (within a span of 10 days), scoring about 96% per activity on average.
On each such day, the learner opened a number of activities almost at once, spent
progressively more time on each subsequent activity, and then submitted them all almost at once.
This happened towards the end of the course, so completing online activities might 
have been a course requirement. 

Figure 4. Distribution of leading digits of time spent on MyEnglishLab activities 
by a learner whose behaviour does not conform to Benford’s law 



4. Conclusions 

We showed that the distributions of leading digits of the number of online
activities completed by EFL learners and of the time spent on those activities closely
follow Benford’s law. The approach used in this paper shows how noisy online data,
such as time-on-task data, can yield insights that standard methods of analysis
would not reveal.

Benford’s law has been applied to tax fraud detection, and our results show that it
may also be worth applying to the detection of abnormal learner behaviour. Although
we do not know whether the learners whose behaviour did not conform to Benford’s law
in this study behaved so because of poor time management skills or other
factors, Benford’s law could help flag such learners to teachers, who could then
talk to them and choose the most appropriate intervention.

Still, our findings are only indicative, and future research should focus on validating
this approach and its usefulness to teachers. Another strand of research
could focus on comparing the computational performance of different tests 
for evaluating conformity to Benford’s law (with operationalising large-scale 
detection in mind) and comparing this approach to other methods of detection 
and prediction of learner performance, e.g. those that rely on simpler metrics, 
such as login frequency. 


5. Acknowledgements 

We thank Daniel Roe, Category Director of Pearson English, for granting us 
permission to share the findings broadly. We also thank Claire Masson and the 
Pearson English MyEnglishLab team. 


References 

Benford, F. (1938). The law of anomalous numbers. Proceedings of the American Philosophical
Society, 78(4), 551-572. http://www.jstor.org/stable/984802
Joenssen, D. W. (2015). BenfordTests: statistical tests for evaluating conformity to Benford’s law
[Computer software]. http://CRAN.R-project.org/package=BenfordTests
Miller, S. J. (Ed.). (2015). Benford’s law: theory and applications. New Jersey: Princeton
University Press. https://doi.org/10.1515/9781400866595
Nigrini, M. J. (2012). Benford’s law: applications for forensic accounting, auditing, and fraud
detection. Hoboken: Wiley. https://doi.org/10.1002/9781119203094
R Core Team. (2016). The R project for statistical computing [Computer software].
https://www.R-project.org
RStudio. (2016). Integrated Development for R [Computer software]. http://www.rstudio.com
Slepkov, A. D., Ironside, K. B., & DiBattista, D. (2015). Benford’s law: textbook exercises
and multiple-choice testbanks. PLoS ONE, 10(2), 1-13. https://doi.org/10.1371/journal.pone.0117972




Published by Research-publishing.net, not-for-profit association 
Dublin, Ireland; Voillans, France, info@research-publishing.net 

© 2016 by Editors (collective work) 

©2016 by Authors (individual work) 

CALL communities and culture - short papers from EUROCALL 2016 
Edited by Salomi Papadima-Sophocleous, Linda Bradley, and Sylvie Thouesny 

Rights: All articles in this collection are published under the Attribution-NonCommercial-NoDerivatives 4.0 International
(CC BY-NC-ND 4.0) licence. Under this licence, the contents are freely available online as PDF files
(https://doi.org/10.14705/rpnet.2016.EUROCALL2016.9781908416445) for anybody to read, download, copy, and redistribute
provided that the author(s), editorial team, and publisher are properly cited. Commercial use and derivative works are, 
however, not permitted. 



Disclaimer: Research-publishing.net does not take any responsibility for the content of the pages written by the authors of 
this book. The authors have recognised that the work described was not published before, or that it is not under consideration 
for publication elsewhere. While the information in this book is believed to be true and accurate on the date of its going
to press, neither the editorial team, nor the publisher can accept any legal responsibility for any errors or omissions that 
may be made. The publisher makes no warranty, expressed or implied, with respect to the material contained herein. While 
Research-publishing.net is committed to publishing works of integrity, the words are the authors’ alone. 

Trademark notice: product or corporate names may be trademarks or registered trademarks, and are used only for 
identification and explanation without intent to infringe. 

Copyrighted material: every effort has been made by the editorial team to trace copyright holders and to obtain their 
permission for the use of copyrighted material in this book. In the event of errors or omissions, please notify the publisher of 
any corrections that will need to be incorporated in future editions of this book. 

Typeset by Research-publishing.net 

Cover design by © Easy Conferences, info@easyconferences.eu, www.easyconferences.eu 
Cover layout by © Raphael Savina (raphael@savina.net) 

Photo “bridge” on cover by © Andriy Markov/Shutterstock 
Photo “frog” on cover by © Fany Savina (fany.savina@gmail.com) 

Fonts used are licensed under a SIL Open Font License 

ISBN13: 978-1-908416-43-8 (Paperback - Print on demand, black and white) 

Print on demand technology is a high-quality, innovative and ecological printing method, with which the book is never ‘out
of stock’ or ‘out of print’. 

ISBN13: 978-1-908416-44-5 (Ebook, PDF, colour) 

ISBN13: 978-1-908416-45-2 (Ebook, EPUB, colour) 

Legal deposit, Ireland: The National Library of Ireland, The Library of Trinity College, The Library of the University of 
Limerick, The Library of Dublin City University, The Library of NUI Cork, The Library of NUI Maynooth, The Library of 
University College Dublin, The Library of NUI Galway. 

Legal deposit, United Kingdom: The British Library. 

British Library Cataloguing-in-Publication Data. 

A cataloguing record for this book is available from the British Library. 

Legal deposit, France: Bibliothèque Nationale de France (legal deposit: December 2016).


